40 research outputs found

    A deep learning approach to bilingual lexicon induction in the biomedical domain

    BACKGROUND: Bilingual lexicon induction (BLI) is an important task in the biomedical domain as translation resources are usually available for general language usage, but are often lacking in domain-specific settings. In this article we consider BLI as a classification problem and train a neural network composed of a combination of recurrent long short-term memory and deep feed-forward networks in order to obtain word-level and character-level representations. RESULTS: The results show that the word-level and character-level representations each improve state-of-the-art results for BLI and biomedical translation mining. The best results are obtained by exploiting the synergy between these word-level and character-level representations in the classification model. We evaluate the models both quantitatively and qualitatively. CONCLUSIONS: Translation of domain-specific biomedical terminology benefits from the character-level representations compared to relying solely on word-level representations. It is beneficial to take a deep learning approach and learn character-level representations rather than relying on handcrafted representations that are typically used. Our combined model captures the semantics at the word level while also taking into account that specialized terminology often originates from a common root form (e.g., from Greek or Latin).
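
    The abstract describes the model only at a high level. As a minimal, illustrative Python/PyTorch sketch of that kind of architecture, the code below builds a character-level LSTM encoder and word-level embeddings, concatenates the two representations per word, and applies a feed-forward classifier that scores a candidate source-target word pair as a translation or not. All class names, dimensions, and the way the representations are fused are assumptions made for this sketch, not the authors' implementation.

        # Illustrative sketch only: BLI framed as binary classification over word pairs,
        # combining a character-level LSTM representation with word-level embeddings.
        import torch
        import torch.nn as nn

        class BLIPairClassifier(nn.Module):
            def __init__(self, n_chars, n_words, char_dim=32, char_hidden=64,
                         word_dim=300, ff_hidden=256):
                super().__init__()
                self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
                self.char_lstm = nn.LSTM(char_dim, char_hidden, batch_first=True)
                self.word_emb = nn.Embedding(n_words, word_dim)  # stand-in for pretrained cross-lingual embeddings
                pair_dim = 2 * (char_hidden + word_dim)          # source and target representations, concatenated
                self.ff = nn.Sequential(
                    nn.Linear(pair_dim, ff_hidden),
                    nn.ReLU(),
                    nn.Linear(ff_hidden, 1),                     # logit for "is a translation pair"
                )

            def encode(self, word_ids, char_ids):
                # word_ids: (batch,); char_ids: (batch, max_word_length), 0-padded
                _, (h, _) = self.char_lstm(self.char_emb(char_ids))
                return torch.cat([self.word_emb(word_ids), h[-1]], dim=-1)

            def forward(self, src_words, src_chars, tgt_words, tgt_chars):
                pair = torch.cat([self.encode(src_words, src_chars),
                                  self.encode(tgt_words, tgt_chars)], dim=-1)
                return torch.sigmoid(self.ff(pair)).squeeze(-1)  # probability the pair is a translation

        # Toy usage with random ids, just to show the expected tensor shapes.
        model = BLIPairClassifier(n_chars=100, n_words=50000)
        prob = model(torch.tensor([3]), torch.randint(1, 100, (1, 12)),
                     torch.tensor([7]), torch.randint(1, 100, (1, 14)))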

    Arabidopsis ULTRAVIOLET-B-INSENSITIVE4 maintains cell division activity by temporal inhibition of the anaphase-promoting complex/cyclosome

    The anaphase-promoting complex/cyclosome (APC/C) is a multisubunit ubiquitin ligase that regulates progression through the cell cycle by marking key cell division proteins for destruction. To ensure correct cell cycle progression, accurate timing of APC/C activity is important, which is achieved through its association with both activating and inhibitory subunits. However, although the APC/C is highly conserved among eukaryotes, no APC/C inhibitors are known in plants. Recently, we have identified ULTRAVIOLET-B-INSENSITIVE4 (UVI4) as a plant-specific component of the APC/C. Here, we demonstrate that UVI4 uses conserved APC/C interaction motifs to counteract the activity of the CELL CYCLE SWITCH52 A1 (CCS52A1) activator subunit, inhibiting the turnover of the A-type cyclin CYCA2;3. UVI4 is expressed in an S phase-dependent fashion, likely through the action of E2F transcription factors. Correspondingly, uvi4 mutant plants failed to accumulate CYCA2;3 during the S phase and prematurely exited the cell cycle, triggering the onset of the endocycle. We conclude that UVI4 regulates the temporal inactivation of the APC/C during DNA replication, allowing CYCA2;3 to accumulate above the level required for entering mitosis, and thereby regulates meristem size and plant growth rate.

    Improving the translation environment for professional translators

    When using computer-aided translation systems in a typical professional translation workflow, there are several stages at which there is room for improvement. The SCATE (Smart Computer-Aided Translation Environment) project investigated several of these aspects, both from a human-computer interaction point of view and from a purely technological side. This paper describes the SCATE research with respect to improved fuzzy matching, parallel treebanks, the integration of translation memories with machine translation, quality estimation, terminology extraction from comparable texts, the use of speech recognition in the translation process, and human-computer interaction and interface design for the professional translation environment. For each of these topics, we describe the experiments we performed and the conclusions drawn, providing an overview of the highlights of the entire SCATE project.

    C-BiLDA: extracting cross-lingual topics from non-parallel texts by distinguishing shared from unshared content

    We study the problem of extracting cross-lingual topics from non-parallel multilingual text datasets with partially overlapping thematic content (e.g., aligned Wikipedia articles in two different languages). To this end, we develop a new bilingual probabilistic topic model called comparable bilingual latent Dirichlet allocation (C-BiLDA), which is able to deal with such comparable data, and, unlike the standard bilingual LDA model (BiLDA), does not assume the availability of document pairs with identical topic distributions. We present a full overview of C-BiLDA, and show its utility in the task of cross-lingual knowledge transfer for multi-class document classification on two benchmarking datasets for three language pairs. The proposed model outperforms the baseline LDA model, as well as the standard BiLDA model and two standard low-rank approximation methods (CL-LSI and CL-KCCA) used in previous work on this task.
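
    For context, the Python sketch below samples from the standard BiLDA baseline that C-BiLDA generalizes: each aligned document pair shares a single topic distribution, while each language has its own topic-word distributions. C-BiLDA's actual contribution, letting the two sides of a pair diverge in their topic proportions so that unshared content can be modelled, is deliberately left out here; the exact parameterization is in the paper. Vocabulary sizes and hyperparameters are arbitrary illustration values.

        # Toy generative sampler for standard BiLDA (the baseline that C-BiLDA extends).
        import numpy as np

        rng = np.random.default_rng(0)
        K, V_s, V_t = 5, 1000, 1200      # number of topics, source/target vocabulary sizes
        alpha, beta = 0.1, 0.01          # symmetric Dirichlet hyperparameters

        phi_s = rng.dirichlet([beta] * V_s, size=K)  # per-topic word distributions, source language
        phi_t = rng.dirichlet([beta] * V_t, size=K)  # per-topic word distributions, target language

        def sample_document_pair(len_s=80, len_t=90):
            theta = rng.dirichlet([alpha] * K)        # one topic mixture shared by both sides of the pair
            z_s = rng.choice(K, size=len_s, p=theta)  # topic assignment per source-language token
            z_t = rng.choice(K, size=len_t, p=theta)  # topic assignment per target-language token
            doc_s = [rng.choice(V_s, p=phi_s[k]) for k in z_s]
            doc_t = [rng.choice(V_t, p=phi_t[k]) for k in z_t]
            return doc_s, doc_t

        src_doc, tgt_doc = sample_document_pair()     # two comparable documents drawn over one shared theta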
